Lab 4: Spatial Wrangling
1 OVERVIEW
1.1 Learning Objectives
The aim of this week’s lab is to get comfortable reading in and managing spatial datasets using the sp and sf packages.
SEE CANVAS FOR SUBMISSION DATES.
1.2 Get help
If a link to a tutorial is broken, you should be able to go to the tutorial number and find it in the menu.
Teams is the fastest way to get help. CLICK THIS LINK FOR THE TEAMS WEBSITE FOR LAB HELP
2 LAB SET-UP
2.1 Create a project
- Using R-CLOUD? Click here: https://psu-spatial.github.io/Geog364-2022/index_GEOG364-22_Tutorial_R.html#2_R-Studio_CLOUD (this also has instructions on uploading/downloading code from your computers).
- Using YOUR LAPTOP? Click here: https://psu-spatial.github.io/Geog364-2022/index_GEOG364-22_Tutorial_R.html#3_R-Studio_Desktop
2.2 Use a template
You are welcome to use your own template, but for ease I suggest using one of the professional ones, such as those in the rmdformats package or the prettydoc package. To use these:
- (If you have not already) click on the Packages tab, then the Install button. Install the rmdformats package and the prettydoc package.
- Same as normal, go to File | New File | R Markdown. But NOW, click on the templates button on the left.
- You will see a whole load of templates from the different packages. Each will give you professional formatting for very little work. To explore what they look like without having to try each one, google the websites for rmdformats, prettydoc and others.

To see what a template looks like, choose it, press OK, then press knit.
Choose your favourite. Finally, remember to add in the title and author lines at the top of your Rmd file. For example here is the final YAML for this script.

2.3 Add libraries & code options
Edit the first “set-up” code chunk so it looks like this and run/knit to load. You might need additional libraries as you work through the lab. If so, add them in this code chunk AND REMEMBER TO RERUN. If your template didn’t have a “setup” code chunk, just create one at the top.
If you see a little yellow bar at the top asking you to install the packages, click yes!
knitr::opts_chunk$set(cache = TRUE,message=FALSE,warning=FALSE,echo=TRUE)
# LIBRARIES
library(tidyverse)
library(dplyr)
library(ggpubr)
library(skimr)
library(ggplot2)
library(plotly)
library(knitr)
library(raster)
library(sp)
library(sf)
library(tmap)
library(terra)
library(palmerpenguins) #you might need to install this
library(rnaturalearth)
3 CODE SHOWCASE
3.1 Spatial Data Wrangling in Data Camp [20 MARKS]
Create a code showcase section in your lab report. Complete CHAPTER 1 AND CHAPTER 2 of this datacamp course on spatial data in R. Include a screenshot in your lab report to show you did it.
See Canvas for how to access data camp for free.
3.2 Markdown and inline code
3.2.1 Learn about inline code
One of the best parts of R markdown is that you can embed code in your actual report text. Imagine, for example, that you had written about the mean car year in Lab 1 and then realised you had made a mistake. Rather than having to change the answer in your write-up, you can include “inline” code, which will auto-update as you fix the mistake.
To see it in action and learn how to do it, follow these three links:
- https://rmarkdown.rstudio.com/lesson-4.html : A nice example of how it looks
- https://bookdown.org/yihui/rmarkdown-cookbook/r-code.html : How to do it
- https://campus.datacamp.com/courses/communicating-with-data-in-the-tidyverse/introduction-to-rmarkdown?ex=10 : test your knowledge
3.2.2 Test your learning [10 MARKS]
- Make a new code chunk and load the penguins dataset (from the package palmerpenguins).
data(penguins, package="palmerpenguins")
Click on penguins in the environment tab to view the data. More details here: https://allisonhorst.github.io/palmerpenguins/
In the same code chunk, find the MEAN flipper length and the MAXIMUM body mass, and save to variables called flipper and mass.
- HINT, there is missing data in these columns, to ignore it use
na.rm=TRUE (see here https://www.statology.org/na-rm/ )
In the code chunk options (the { r } bit at the top of the chunk), add include=FALSE so that the code chunk is invisible. When you press knit, it should look like nothing has happened.
Finally, below the code chunk, write this sentence, using inline code to replace the XXXX/YYYY with the actual average flipper length / max body mass.
In the Palmer Penguins dataset, the mean flipper length is XXXX mm and the maximum body mass is YYYY g.
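As a rough sketch of the two ideas above (na.rm and inline code), here is a toy example. The vector wing_mm and its values are made up for illustration and are not the penguins columns, so you will still need to adapt it yourself.

```r
# Hypothetical toy data (NOT the penguins dataset): one value is missing
wing_mm <- c(181, 195, NA, 210)

# Without na.rm, a single NA makes the whole answer NA
mean(wing_mm)

# With na.rm = TRUE, the missing value is ignored
wing_mean <- mean(wing_mm, na.rm = TRUE)
wing_max  <- max(wing_mm, na.rm = TRUE)

# In the report text (outside the code chunk), inline code is written
# between backticks starting with the letter r, for example:
#   The mean wing length is `r round(wing_mean, 1)` mm.
```

When you knit, the backtick expression is replaced by the current value of the variable, so the sentence updates itself if the data or the code changes.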
4 MAIN ANALYSIS
4.1 Aim of the analysis
You are writing a report for Dr Sara Stoudt at the department of Maths in Bucknell University (for real - I am going to share these with her). Dr Stoudt is a spatial statistician who focuses on the analysis of crowd-sourced datasets, such as https://www.inaturalist.org/ and https://journeynorth.org. Have a look at her bio here: https://sastoudt.github.io/.
Specifically, we will be conducting an analysis on a new crowd-sourced dataset that she has never seen before (again for real): a crowd-sourced dataset on fireflies. As before, for the entire report below, we will be grading on the professionalism of your output.
4.2 Report set-up
Create these headings and sub-headings in your report.
- BACKGROUND ON CROWD-SOURCED DATA
  - What is crowd-sourced data?
  - What are its strengths?
  - What are its weaknesses?
  - How do spatial fallacies impact it?
- STUDY SUMMARY
  - Fireflies
  - The Firefly Watch project
- DATA DESCRIPTION
- DATA WRANGLING
- SPATIAL ANALYSIS
- MAPS
- ANALYSIS
For example, when you press knit, it should look something like this:
4.3 Background on crowd sourced data.
Although GEOG-364 is not a writing course, it is important to be able to describe the data and topics you are covering. We are grading you on content, not on grammar. Crowd-sourced data is an important type of data which will only grow in popularity, so it’s important to understand its strengths and weaknesses.
First, BRIEFLY skim read these three papers. I start by reading the abstracts and sub-headings, then zoom in on the parts I find interesting.
- https://journals.sagepub.com/doi/10.1177/1745691619850561 (Scientific Utopia III: Crowdsourcing Science)
- https://www.pnas.org/doi/full/10.1073/pnas.2110156119 (Dr Stoudt’s most recent paper: Identifying engaging bird species and traits with community science observations)
- https://www.geog.uni-heidelberg.de/md/chemgeo/geog/gis/europ_handb_of_crowds_geog_inf_chapter6_jacobs.pdf (Limitations of crowd sourcing data)
4.3.1 what to write up
In the appropriate section of your report and referring to the sources above (plus other documents as you wish), write:
- One paragraph on what crowd-sourced nature data is [5 marks]
- One paragraph on the strengths and opportunities for crowd sourced data in understanding the natural world around us [5 marks]
- One paragraph on the weaknesses for crowd sourced data in understanding the natural world around us [5 marks]
- One paragraph where you use the course notes to explain how the non-uniformity of space and the locational fallacy might impact some of these datasets [5 marks]
4.4 Background on fireflies
firefly <- readxl::read_excel("fireflydata.xlsx")
Fireflies are well loved insects, yet we don’t actually have a map of where they are - or know if they are declining or increasing. For example, we don’t know how climate change, pesticides or light pollution are affecting their numbers.
Refresh your knowledge on fireflies (these are just ideas.. spend 5-10mins on this max)
- here: https://www.massaudubon.org/learn/nature-wildlife/insects-arachnids/fireflies/about
- or this video if you’re a visual learner: https://www.youtube.com/watch?v=Y7RI1qjB2r8
- https://www.massaudubon.org/get-involved/community-science/firefly-watch/resources
To gain more data, a group of researchers started a citizen-science project called Firefly Watch where people could submit their firefly observations. See more here:
https://www.massaudubon.org/get-involved/community-science/firefly-watch
The aim of this lab is to see if this crowd sourced dataset can show/explain spatial patterns in reported sightings of fireflies/lightning bugs.
4.4.1 what to write up
In the appropriate sections of the Study Summary part of your report:
- Introduce fireflies as a topic and explain why we might want to map them, summarising a few facts about fireflies from your reading.
- Introduce the Firefly Watch Study
There is a spell check next to the knit button at the top of the script. Press knit regularly to check it all looks good.
4.5 Data Analysis [10 MARKS]
4.5.1 Data description
Go to the Canvas Lab 4 page and download the dataset (“firefly.xlsx”).
Put it in your lab 4 project folder (or use the upload button in R-studio cloud to put it in your project)
Use the Input/Output tutorial 61 to read the data into R and save it as a variable called firefly (hint: readxl package).
View/summarise the data and get comfortable with it. You could use some summary statistics from Summary Stats Tutorial 8. You do not need to include them all.
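As a minimal sketch of reading an Excel file and taking a first look at it, the example below uses a practice file that ships with the readxl package (found with readxl_example()) as a stand-in for the lab data. The names example_path and example_data are made up for illustration; for the lab you would point read_excel() at your own downloaded file instead.

```r
library(readxl)

# Stand-in file that comes with readxl (an assumption for illustration only);
# replace this with the path to the file in your own project folder
example_path <- readxl_example("datasets.xlsx")

# read_excel() reads the first sheet by default and returns a tibble
example_data <- read_excel(example_path)

# Quick ways to get comfortable with a new dataset
dim(example_data)      # number of rows and columns
names(example_data)    # column names
summary(example_data)  # summary statistics for each column
head(example_data)     # first few rows
```

If the file is not in your project folder, read_excel() will stop with a "path does not exist" error, which is usually a sign the file needs moving or uploading.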
Write these details as a bullet point list in your data description section:
- Object of analysis of the dataset
- Population of the dataset (e.g. boundary in time and space)
- Variables in the dataset
- Number of objects/rows
- Which years do we have data for? How many observations in each year? (hint: apply the table() command to the Year column of the firefly dataset)
- Is Pennsylvania included in the dataset? How many observations were taken in PA?
- In a new paragraph, explain if you think the firefly data is marked, and if so, give an example of a mark.
See if you can include the numbers as inline code rather than typing them.
HINT, there is not one row for every firefly that has ever existed in the USA. Think about what each row is.
HINT 2, WE SHOULD BE ABLE TO SEE IN YOUR CODE WHERE YOU GOT EACH ANSWER (e.g. leave your code visible)
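To illustrate the kind of counting these bullet points ask for, here is a small sketch on a made-up table. The data frame obs and its State column are hypothetical, so check the real column names in your own data before adapting it.

```r
# Hypothetical toy observations (NOT the firefly data)
obs <- data.frame(
  Year  = c(2019, 2019, 2020, 2021, 2021, 2021),
  State = c("NY", "PA", "PA", "OH", "NY", "PA")
)

# table() counts how many rows fall in each category
year_counts <- table(obs$Year)
year_counts

# Total number of rows, and number of rows matching a condition
n_rows <- nrow(obs)
n_pa   <- sum(obs$State == "PA")

# These saved variables can then be used as inline code in your text, e.g.
#   The dataset has `r n_rows` rows, of which `r n_pa` are from PA.
```

Saving each answer to a variable keeps the code visible in your report and lets the text update itself if the data changes.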
4.5.2 Data wrangling
In the data wrangling section, use either FILTERING Tutorial 7D or https://crd150.github.io/lab2.html#Filtering to help you complete these tasks:
Use R code to find the value in the second row and fourth column of your data.
In the MAIN firefly data, if you look closely at your summary, you might find there are some unusual temperature values.
Let’s assume that a temperature of 8000F is not likely to be true. Filter the data so that the temperature is below 200F and overwrite the original data (e.g. save the result as a variable called firefly).
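Here is one possible sketch of both wrangling steps on a made-up table. The data frame toy and its column temp_f are hypothetical, so you will need to look up the real temperature column name in your own data.

```r
library(dplyr)

# Hypothetical toy data (NOT the firefly data), with one impossible temperature
toy <- data.frame(
  id     = 1:4,
  site   = c("a", "b", "c", "d"),
  count  = c(3, 0, 7, 2),
  temp_f = c(68, 8000, 75, 71)
)

# Square brackets select [row, column]: here row 2, column 4
second_row_fourth_col <- toy[2, 4]

# Keep only rows where the temperature is plausible, and overwrite toy
toy <- dplyr::filter(toy, temp_f < 200)
nrow(toy)
```

Writing dplyr::filter() in full is a defensive habit when many packages are loaded, because several packages have a function called filter.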
4.5.3 Making your data spatial
If your datacamp/training gives you a better way of approaching this code, go for it!
Tutorial 11A and Tutorial 11B should help, along with the DataCamp course.
The firefly data is in standard lat/lon, so EPSG=4326.
- Use your own knowledge or these instructions to make an sf version of your firefly data and assign it to a variable called firefly.sf. You can leave it in lon/lat (EPSG 4326) for this lab.
- Use Tutorial 11Bc to load rnaturalearth state-boundaries for US States. Assign to a variable called states.sf and use st_transform to convert to projection 4326.
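As a rough sketch of these two steps, the example below turns a made-up table into an sf object and, if the extra rnaturalearthhires package happens to be installed, loads state boundaries. The names toy, toy.sf, lon and lat are hypothetical, and the tutorials may take a different route, so treat this as one option rather than the required method.

```r
library(sf)

# Hypothetical toy points (NOT the firefly data); check your own column names
toy <- data.frame(
  id  = 1:3,
  lon = c(-77.86, -75.16, -79.99),
  lat = c(40.79, 39.95, 40.44)
)

# coords is given as c(x, y), which means longitude first, then latitude
toy.sf <- st_as_sf(toy, coords = c("lon", "lat"), crs = 4326)
st_crs(toy.sf)$epsg

# State boundaries need the rnaturalearthhires package, so only try this
# part if both packages are available on your machine
if (requireNamespace("rnaturalearth", quietly = TRUE) &&
    requireNamespace("rnaturalearthhires", quietly = TRUE)) {
  toy.states <- rnaturalearth::ne_states(
    country = "united states of america", returnclass = "sf"
  )
  # Make sure the boundaries use the same projection as the points
  toy.states <- st_transform(toy.states, 4326)
}
```

A common mistake is swapping the order of the coordinates; if your points appear in the wrong part of the world, check that longitude comes first in coords.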
4.6 Making maps [5 MARKS]
Let’s now see how our data looks plotted. I have provided a few examples. Your job is to get them running and interpret them.
4.6.0.1 Make a basic plot
In a new code chunk, enter the following code. You should see a basic plot with the firefly locations and the state borders. If so, congrats! If not, you may need to adjust your projections, or something else has gone wrong.
plot(st_geometry(firefly.sf),
pch=16,
col=rgb(0,0,1,.5),
cex=.5,
main="Firefly locations")
plot(st_geometry(states.sf),add=TRUE)
Recreate this plot in your report. Google the rgb() command and edit your plot so that the points are semi-transparent purple. (hint: https://www.r-graph-gallery.com/43-rgb-colors.html )
Hint, you can also use tmap and qtm from the previous labs to explore the data.
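To help decode the col argument in the plot above, here is a short sketch of how rgb() works. The variable names are made up, and the orange colour is only an example so that the purple is left for you to work out.

```r
# rgb(red, green, blue, alpha): each value runs from 0 to 1 by default,
# and alpha controls transparency (0 = invisible, 1 = solid)
blue_half   <- rgb(0, 0, 1, 0.5)    # the colour used in the plot above
orange_half <- rgb(1, 0.5, 0, 0.5)  # a different example colour

# rgb() simply returns a hex colour string that any plot function accepts
blue_half
orange_half

# Overlapping semi-transparent points look darker where they pile up
plot(c(1, 1.2, 1.4), c(1, 1.1, 1), pch = 16, cex = 8,
     col = orange_half, xlim = c(0, 2.4), ylim = c(0, 2))
```

Semi-transparency is useful for point maps because dense clusters of observations show up as darker patches instead of a single solid blob.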
4.6.0.2 Make a more detailed tmap plot
The plot above is still pretty basic, so let’s explore another of the big packages available for making spatial visualisations. We’re going to extend your knowledge of tmap.
Look at the command below, you can see that we’re building a series of layers linked by the + symbol.
tmap_mode("plot") # Set the static plot mode
myplot <- tm_shape(firefly.sf) + # Load the firefly data
tm_dots(col="black", size=0.05) + # Plot it as dots
tm_shape(states.sf) + # Load the state borders
tm_borders(lwd=.5) # Plot them as just borders
myplot
I have saved the map as a variable called myplot, then typed its name to print it. Because the map is stored in R, I can now turn on the interactive view mode and re-plot it without rebuilding it:
tmap_mode("view")
myplot